WAON: Large-Scale and High-Quality Japanese Image-Text Pair Dataset for Vision-Language Models
Sugiura, Issa; Kurita, Shuhei; Oda, Yusuke; Kawahara, Daisuke; Okabe, Yasuo; Okazaki, Naoaki
Large-scale and high-quality image-text pair datasets play an important role in developing high-performing Vision-Language Models (VLMs). In this work, we introduce WAON, a large-scale and high-quality Japanese image-text pair dataset containing approximately 155 million examples, collected from Common Crawl. Our dataset construction pipeline employs various techniques, including filtering and deduplication, which have been shown to be effective in previous studies. To evaluate its effectiveness, we also construct WAON-Bench, a manually curated benchmark for Japanese cultural image classification, consisting of 374 classes. To assess the effectiveness of our dataset, we conduct experiments using both WAON and the Japanese subset of ReLAION, one of the most widely used vision-language datasets. We fine-tune SigLIP2, a strong multilingual model, on both datasets. The results demonstrate that WAON enhances model performance on WAON-Bench more efficiently than ReLAION and achieves higher accuracy across all evaluated benchmarks. Furthermore, the model fine-tuned on WAON achieves state-of-the-art performance on several Japanese cultural benchmarks. We release our dataset, model, and code at https://speed1313.github.io/WAON.
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
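The WAON abstract above mentions filtering and deduplication but does not say how either step is implemented. The following is only a minimal sketch of the general idea, under loud assumptions: records are hypothetical `(image_url, caption)` tuples, duplicates are detected by exact URL hash, and the language filter is a simple kana/kanji character-ratio heuristic. None of these choices are taken from the WAON pipeline.

```python
import hashlib
import re

# Assumption: a record is an (image_url, caption) tuple. This is an
# illustrative layout, not the WAON schema.

# Hiragana, katakana, and common CJK ideographs.
JAPANESE_CHARS = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff]")


def is_japanese(caption, min_ratio=0.3):
    """Heuristic: keep captions where at least min_ratio of the
    characters are kana or kanji. The threshold is an arbitrary choice."""
    if not caption:
        return False
    hits = len(JAPANESE_CHARS.findall(caption))
    return hits / len(caption) >= min_ratio


def dedup_and_filter(pairs):
    """Drop non-Japanese captions and exact-duplicate image URLs,
    keeping the first accepted occurrence of each URL."""
    seen = set()
    kept = []
    for url, caption in pairs:
        key = hashlib.sha256(url.encode("utf-8")).hexdigest()
        if key in seen or not is_japanese(caption):
            continue
        seen.add(key)
        kept.append((url, caption))
    return kept


if __name__ == "__main__":
    sample = [
        ("http://example.com/1.jpg", "富士山の写真"),
        ("http://example.com/1.jpg", "富士山の写真"),
        ("http://example.com/2.jpg", "a photo of a cat"),
    ]
    print(dedup_and_filter(sample))
```

Exact URL hashing only catches identical links; a production pipeline would more likely also compare image content or near-duplicate captions, which this sketch does not attempt.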
Midjourney Mastery: A Guide to Using Image Prompts - Metaroids
Midjourney is an AI tool that leads to the door of limitless creativity, transforming human imagination into visual art. Only this time, we'll unlock it not with words but with IMAGES. An image can be used as a prompt on Midjourney, serving as a reference to the art it will generate. All you have to do is combine them with different photos, texts, and other elements; you name it. You can even get more creative outputs with a bit of thinking outside the box. Don't worry; it'll be an easy task.
Computer vision API - Skyl.ai
Computer vision APIs let you run computer vision tasks programmatically, at scale and in real time. Once set up, a computer vision API can run tasks concurrently on millions of data items. This makes it easy to integrate these APIs into your apps or websites and deliver cutting-edge, computer-vision-backed experiences to your customers. For example, you might have a reverse image search engine that takes a photo as input and returns a set of similar images from the web. You can implement this quickly using computer vision APIs, even if you have no expertise in machine learning or computer vision.
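The reverse image search example can be sketched as a nearest-neighbour lookup over image feature vectors. This is a hedged illustration only: it does not call the Skyl.ai API (the snippet above does not describe it), and it assumes that some vision model or API has already turned each image into a numeric embedding. The names `cosine`, `most_similar`, and the toy index are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, index, top_k=3):
    """Return the top_k (name, score) pairs from index, a dict mapping
    image names to embeddings, ranked by similarity to query."""
    scored = [(name, cosine(query, vec)) for name, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Toy 2-D embeddings; real ones would come from a vision model.
    index = {
        "cat.jpg": [1.0, 0.0],
        "dog.jpg": [0.0, 1.0],
        "kitten.jpg": [0.9, 0.1],
    }
    for name, score in most_similar([1.0, 0.0], index, top_k=2):
        print(name, round(score, 3))
```

A linear scan like this is fine for a few thousand images; at web scale an approximate nearest-neighbour index would normally replace it.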
How to Generate Text from Images with Python
In the Google Search: State of the Union last May, John Mueller and Martin Splitt devoted about a fourth of the address to image-related topics. They announced a long list of improvements to Google Image Search and predicted that it would be a massive untapped opportunity for SEO. SEO Clarity, an SEO tool vendor, released a very interesting report around the same time. Among other findings, they found that more than a third of web search results include images. Images are important to search visitors not only because they are more visually attractive than text, but also because they instantly convey context that would take much longer to absorb by reading.
Large-Scale Serverless Machine Learning Inference with Azure Functions
This article is part of #ServerlessSeptember. You'll find other helpful articles, detailed tutorials, and videos in this all-things-Serverless content collection. New articles are published every day -- that's right, every day -- from community members and cloud advocates in the month of September. Azure Functions recently announced the general availability of their Python language support. We can use Python 3.6 and Python's large ecosystem of packages, such as TensorFlow, to build serverless functions. Today, we'll look at how we can use TensorFlow with Python Azure Functions to perform large-scale machine learning inference.